ABSTRACT


The 1948 Selective Service Act established a process
whereby all United States (US) military applicants take an
aptitude test to measure their suitability for military job
specialties. The latest version of these tests, the Armed
Services Vocational Aptitude Battery (ASVAB), was introduced
in 1968. Approximately 900,000 High School students from
14,000 US High Schools take the ASVAB test each year. This
"paper and pencil" test requires the applicant to answer
multiple choice questions (items) on a printed form. The
creation of paper and pencil forms in one of the ten test
topics is called form assembly. Form assembly consists of
picking 20 to 35 items from an item pool of about 300 items
such that: 1) each item appears on at most one form; 2) each
form's result represents the applicant's capability; and 3)
each form has the same level of difficulty. The thesis
models the creation of paper and pencil forms as a mixed
integer linear goal program and solves the problem both
optimally and heuristically. Computational results for seven
ASVAB-Tests show both methods help improve the form assembly
process.

EXECUTIVE SUMMARY


The 1948 Selective Service Act established a
process whereby all United States military applicants take
an aptitude test to measure their suitability for military
job specialties. The latest version of these tests, the
Armed Services Vocational Aptitude Battery (ASVAB), was
introduced in 1968. Approximately 900,000 High School
students from 14,000 US High Schools take the ASVAB test
each year. This "paper and pencil" test requires the
applicant to answer multiple choice questions (items) on a
printed form. The Defense Manpower Data Center, as an
executive agency for the ASVAB, is responsible for the
design, development and creation of the tests. The creation
of paper and pencil forms in one of the ten test topics is
called form assembly. Form assembly consists of picking 20
to 35 items from an item pool of about 300 items such that:
1) each item appears on at most one form; 2) each form's
result represents the applicant's capability; and 3) each
form has the same level of difficulty. This thesis models
the creation of paper and pencil forms as a mixed integer
linear goal program. One approach solves the program using
commercially available optimization software. A second
approach uses a local search with random restart heuristic.
Both approaches yield good solutions. Computational results
for the seven ASVAB-Tests show that combining both methods
can improve the form assembly process. The Defense Manpower
Data Center benefits from these computational results.

I. INTRODUCTION


The 1948 Selective Service Act established a process
whereby all United States (US) military applicants take an
aptitude test to measure their suitability for military job
specialties. The latest version of these tests, the Armed
Services Vocational Aptitude Battery (ASVAB), was introduced
in 1968. A US Air Force Human Resources Laboratory study in
1973 calculated cost avoidance from these tests at $76.8
million per year for enlisted technical training [US Air
Force Human Resources Laboratory 1973].

The ASVAB is currently given in about 14,000 US High
Schools to about 900,000 potential applicants each year
[Defense Manpower Data Center 1992]. This "paper and pencil"
test requires the applicant to answer multiple choice
questions (items). Each question has one correct answer that
must be selected, on average, from a total of four choices.
The ASVAB test consists of ten different areas of expertise.
The categories, which have between 20 and 35 specific
items each, are Arithmetic Reasoning (AR), Auto and Shop
(AS), Coding Speed (CS), Electronics Information (EI),
General Science (GS), Mechanical Comprehension (MC),
Mathematical Knowledge (MK), Numerical Operations (NO),
Paragraph Comprehension (PC), and Word Knowledge (WK).

The model developed in this thesis addresses only seven
of the ten tests. These seven tests were chosen because they
are similarly structured: each is configured so that the
choice of the next eligible item is independent of the items
chosen before. In other words, there is no dependency among
items from the perspective of the form assembly process.

The creation of paper and pencil forms for each category
is called "form assembly." Multiple forms must be
created in each category so that all applicants are not
tested using the same form. "Form assembly" consists of
picking 20 to 35 items from a pool of about 300 items such
that: 1) each item appears on at most one form; 2) each
form's result represents the applicant's capability; and 3)
each form has the same level of difficulty. The item pool
itself can be split into several item groups, where each
group, called a taxonomy, requires a certain number of items
per form.

This thesis models the creation of paper and pencil
forms as a mixed integer linear goal program and solves the
problem both optimally and heuristically.

A.  TEST  THEORY  BACKGROUND 

The measurement of a person's ability or skill level
(denoted θ) is commonly discretized into 100 intervals, so
that each level can be expressed as a percentage. These
intervals are then called percentiles of the ability. The
skill level distribution over the potential applicant
population is approximately normal, allowing percentiles to be
ranked from -3σ to +3σ around a mean. A reasonable
assumption is that the probability p of answering an item
correctly increases as the percentile increases, with p
approaching 1 as the percentile goes to +3σ. Hence, this
probability can be represented by a logistic function, referred
to as an item response curve. A common model [Lord 1980]
uses a three-parameter logistic function like the one
adapted from Lord and Novick [1968] (Figure 1) with

\[
p(\theta) = c + \frac{1 - c}{1 + e^{-1.7a(\theta - b)}}.
\]

Parameter a is a proportionality factor for the slope
at the inflection point. It represents the discriminating
power; in other words, how well an item distinguishes
between applicants. Figure 2 shows an example where item 1
has a steeper curve in the percentile range (50,60) than
item 2 and therefore provides greater discrimination between
individuals at percentiles 50 and 60.


Figure 1: Parameters of the Logistic Function.

The logistic function represents the probability of answering an
item correctly and is defined with parameters (a, b and c). Parameter a
is proportional to the slope at the inflection point: slope = 0.425a(1-c).
Parameter b indicates an item's difficulty level by defining the
position of an item's curve along the ability scale θ. Parameter c
indicates the guessing parameter [Lord 1980].

Parameter b indicates an item's difficulty level by
defining the position of an item's curve along the ability
scale θ (i.e., the percentile θ_i at which the
probability of a correct answer is 0.5).

Parameter c indicates the guessing parameter or the
probability of answering an item correctly given an ability
falling more than 3σ below the mean [Lord 1980]. This
guessing parameter does not necessarily reflect the
probability of selecting the one correct answer from a certain
number of possible choices.
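
The three parameters can be explored numerically. The following is a minimal Python sketch, not the thesis implementation. It assumes the scaling constant 1.7 (implied by the slope 0.425a(1-c) given in the Figure 1 caption), measures ability in standard deviations rather than percentiles, and uses hypothetical parameter values for a single item.

```python
import math

D = 1.7  # scaling constant; assumption implied by slope = 0.425 * a * (1 - c)


def p_correct(theta, a, b, c):
    """Three-parameter logistic item response curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item parameters (illustration only, not DMDC data).
a, b, c = 1.2, 0.5, 0.2

at_b = p_correct(b, a, b, c)    # value at the inflection point theta = b
low = p_correct(-3.0, a, b, c)  # ability three sigma below the mean
high = p_correct(3.0, a, b, c)  # ability three sigma above the mean

# Central-difference slope at theta = b, to compare with 0.425 * a * (1 - c).
h = 1e-6
slope = (p_correct(b + h, a, b, c) - p_correct(b - h, a, b, c)) / (2 * h)
```

Under this functional form the curve takes the value (1 + c)/2 at theta = b, which equals 0.5 only when c = 0, and it approaches c for very low abilities.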


Figure 2: Example of the Discriminating Power.

Figure 2 provides an example of the discriminating power of two items
for two applicants with percentiles 50 and 60. Item 1 has a steeper
curve in the percentile range (50,60) than item 2 and therefore provides
greater discrimination between individuals at percentiles 50 and 60.

In practice, 1,000 to 10,000 applicants pretest an item
and the parameters a, b and c are estimated from the
results. From the item response curve, an item information
curve is determined (Figure 3). The item information curve
describes the potential information contribution of an item
to a test form at each percentile. These item information
curves comprise the bulk of the data for this thesis.

These item information curves are independent and
additive when it is assumed that the information contribution
of an item to the whole form does not depend on other items
included on the form [Lord 1980]. Therefore all of a form's
item information curves can be added to get an overall
information curve. This overall information curve is commonly
denoted as the precision of the form.
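
The additivity can be illustrated with a short Python sketch. The thesis does not state how the item information curves were computed, so the sketch assumes the standard three-parameter information function, I(θ) = D²a²(Q/P)((P - c)/(1 - c))² with Q = 1 - P. The grid of 100 ability points and the three items are hypothetical.

```python
import math

D = 1.7  # scaling constant (assumption, see the Figure 1 caption)


def p_correct(theta, a, b, c):
    """Three-parameter logistic item response curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Standard 3PL item information (assumed, not taken from the thesis)."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def form_information(thetas, items):
    """Form information curve: pointwise sum of the item curves."""
    return [sum(item_information(t, *item) for item in items) for t in thetas]


# 100 ability points from -3 to +3 sigma, standing in for the percentiles.
thetas = [-3.0 + 6.0 * k / 99 for k in range(100)]
# Hypothetical (a, b, c) triples for a three-item form.
items = [(1.2, -0.5, 0.2), (0.8, 0.0, 0.25), (1.5, 1.0, 0.15)]
curve = form_information(thetas, items)
```

Because the form curve is a plain sum, adding or removing one item changes it by exactly that item's curve, which is what makes swap-based heuristics cheap to evaluate.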


Figure  3:  Item  Information  Curves. 

Figure  3  displays  examples  of  different  item  information  curves.  These 
curves  describe  the  potential  information  contribution  of  an  item  to  a 
test  form  at  each  percentile. 

Empirical research and testing have produced a
"reference curve" for each test representing the desired
information distribution over a form's percentiles. Since
the establishment of a standard reference curve in 1980,
some item pools have changed and it is now possible to
provide forms with "better" information curves than the
reference curve. In such cases, these curves are the new
desired information distribution but, for reasons of
historical consistency, cannot be called reference curves.
Regardless, in this thesis, we refer to the preferred curve
as the "goal curve."

B. OUTLINE


Chapter II provides information about research related
to this thesis. Chapter III formulates the form assembly
process as a mixed integer linear goal programming problem
and discusses a heuristic to solve it. Chapter IV provides
results obtained from solving the formulation using a
heuristic and the General Algebraic Modeling System (GAMS)
[Brooke, Kendrick and Meeraus 1992] with the solver OSL
[GAMS 1995]. Chapter V compares the two solution methods and
presents conclusions.

II. RELATED RESEARCH


The bulk of the literature on aptitude and ability
tests involves the concept of item validity [Lord 1980].
Validity in this case is taken to be the extent to which a
test score actually predicts future performance. Toquam,
Corpe and Dunette [1991] review more than 10,000 articles
related to validity as it pertains to ability tests. Their
literature review highlights the significant effort
associated with this issue. With respect to the ASVAB
specifically, Maier and Truss [1985] give an example of that
test's predictability. In this study, the authors demonstrate
that performance on the ASVAB tests is statistically related
to training outcome measures of various US Marine Corps
technical schools.

The present study uses data provided by the Defense
Manpower Data Center (DMDC). As explained in Chapter I,
these data consist of roughly 300 item information
curves, each curve derived by standard statistical
procedures [Lord and Novick 1968] from item response curves.
These data are assumed to be representative with respect to
the validity issue. Accordingly, the DMDC data used in the
present study are used simply to demonstrate a
methodological approach to "form assembly." They are not being
used to demonstrate their predictive validity.

Unlike the validity literature, there exist only a few
publications addressing assembly or construction of ability
and aptitude tests. Berger, Gupta and Berger [1988] present
the construction of Form P for the Air Force Officer
Qualifying Test (AFOQT). They develop two forms of the test
by adding new items to an old form. The objective is to
construct two new forms which are equivalent and parallel to
the original form. "Equivalence" means that each form has
the same information content. "Parallel" means that the
outcome of the test is independent of the form the applicant
has taken. Their approach is a straightforward heuristic.
They select items with the most discriminating power from
the old form; check them against new items; and replace old
items with new items that provide the best match; that is, a
match which produces the smallest information differences
between the old and the new form.

Baker and Wall [1996] use a form assembly procedure
similar to the heuristic approach presented in this thesis.
They focus on a statistical analysis of the Interest Finder
Test, a test to help students explore their occupational and
career interests [DMDC 1992]. They describe form assembly as
consisting of two stages. The first stage screens the item
pool and the second stage uses a heuristic algorithm to assign
items to the form. Their heuristic selects an initial group
of items and exchanges items when a replacement
improves the form. The objective function is a weighted
function that minimizes statistical differences between the
current form and a desired form. These statistical
differences are essentially the mean and standard deviation of
scaling parameters for the test. The actual criteria for the
initial item selection and results with respect to form
assembly are beyond the scope of this paper.

In summary, the literature review did not reveal prior
attempts to use optimization in form assembly and only
provided scant references to the use of heuristic approaches.
The next chapter discusses the optimization and heuristic
approaches.

III. OPTIMIZATION MODEL AND HEURISTIC


A.  OPTIMIZATION  MODEL 


The form assembly problem can be formulated as a mixed
integer linear goal programming problem (see Charnes and
Cooper [1961] for a discussion of goal programming)
consisting of two goals. One goal is to assemble forms so each
form's information curve is as close as possible to the goal
curve. The second goal is to make each form's information
curve as "parallel" as possible to one another. The
"parallel" goal seeks an exam where results are independent
of the form the applicant has taken. An exam with all forms
exactly matching the goal curve would simultaneously satisfy
both goals but this is typically not possible. The parallel
goal therefore encourages each form to be close to the goal
curve.

We implement the first goal by splitting the deviation
from the goal curve into groups, where deviation
within a group has the same penalty per unit and groups
closer to the goal curve have a smaller penalty per unit.
Figure 4 provides an example of the penalty groups. Any
vertical deviation between the goal curve and form curve is
penalized.

Figure 4: Penalty Groups.

This figure displays at percentile 68 how deviation from the goal curve
can be measured in different groups. The vertical distance Δ1 would be
penalized per unit with the penalty for group 1 for those units of Δ1
within group 1 and with the penalty per unit for group 2 for those units
of Δ1 within group 2. Since it is desired to be as close to the goal
curve as possible, group 1's penalty per unit would be less than group
2's penalty per unit.
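
The grouping can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than the thesis code: the group widths and penalties are the AR-Test settings reported in Chapter IV, the unbounded fourth group carries the penalty 100 given there, and the function name is hypothetical. Filling the cheapest group first matches what the linear program does whenever penalties increase with distance from the goal curve.

```python
def penalized_deviation(deviation, caps, penalties, overflow_penalty):
    """Cost of one vertical deviation split across penalty groups.

    caps[g] is the width of group g (CAT_g), penalties[g] its cost per
    unit; whatever exceeds all bounded groups falls into an unbounded
    last group charged at overflow_penalty.
    """
    remaining = abs(deviation)
    cost = 0.0
    for cap, penalty in zip(caps, penalties):
        used = min(remaining, cap)  # units of deviation inside this group
        cost += penalty * used
        remaining -= used
    return cost + overflow_penalty * remaining


CAPS = [0.01, 0.05, 0.10]        # CAT_1..CAT_3 (AR-Test settings)
PENALTIES = [0.00001, 1.0, 5.0]  # PENALTY_1..PENALTY_3 (AR-Test settings)
OVERFLOW = 100.0                 # penalty of the unbounded fourth group

small = penalized_deviation(0.005, CAPS, PENALTIES, OVERFLOW)
medium = penalized_deviation(0.04, CAPS, PENALTIES, OVERFLOW)
large = penalized_deviation(0.30, CAPS, PENALTIES, OVERFLOW)
```

A deviation of 0.005 stays inside group 1 and is almost free, 0.04 spills into group 2, and 0.30 exhausts all three bounded groups so that most of its cost comes from the unbounded group.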

The formulation follows.

Indices:

i : item from the item pool;
p : percentile (ability level);
f : form to be assembled (1,2,..,F);
t : taxonomy (1,2,..,T); and
g : penalty group.

Data:

CAT_g      the maximum deviation between a form and
           the goal curve in group g;
INF_ip     information value of item i at percentile p;
NITEM_t    the required number of items in taxonomy t;
PARAWEI    weight that combines the two goals;
PENALTY_g  penalty per unit deviation within group g;
           and
SHAPE_p    the information value for the goal curve at
           percentile p.


Variables:

X_if       1, if item i is used on form f (0 otherwise);
PY_pfg     deviation above the desired shape in group g
           at percentile p on form f;
NY_pfg     deviation below the desired shape in group g
           at percentile p on form f;
Delplus_f  the total information form 1 contains that
           exceeds form f; and
Delneg_f   the total information form f contains that
           exceeds form 1.

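Formulation:

\begin{align}
\min\ & \sum_{p}\sum_{f}\sum_{g} \mathrm{PENALTY}_g\,(PY_{pfg} + NY_{pfg})
        + \mathrm{PARAWEI}\sum_{f>1}(\mathrm{Delplus}_f + \mathrm{Delneg}_f) \tag{1}\\
\text{s.t.}\ & \sum_{g} PY_{pfg} \ge \sum_{i} \mathrm{INF}_{ip}\,X_{if} - \mathrm{SHAPE}_p
        && \forall p, f \tag{2}\\
& \sum_{g} NY_{pfg} \ge -\sum_{i} \mathrm{INF}_{ip}\,X_{if} + \mathrm{SHAPE}_p
        && \forall p, f \tag{3}\\
& \sum_{i \in t} X_{if} = \mathrm{NITEM}_t
        && \forall f, t \tag{4}\\
& \sum_{f} X_{if} \le 1
        && \forall i \tag{5}\\
& \sum_{i}\sum_{p} \mathrm{INF}_{ip}\,X_{i1} - \sum_{i}\sum_{p} \mathrm{INF}_{ip}\,X_{if}
        = \mathrm{Delplus}_f - \mathrm{Delneg}_f
        && \forall f > 1 \tag{6}\\
& 0 \le PY_{pfg} \le \mathrm{CAT}_g
        && \forall p, f, g \tag{7}\\
& 0 \le NY_{pfg} \le \mathrm{CAT}_g
        && \forall p, f, g \tag{8}\\
& X_{if}\ \text{binary}
        && \forall i, f \notag\\
& \mathrm{Delplus}_f,\ \mathrm{Delneg}_f \ge 0
        && \forall f. \notag
\end{align}

The sum in (4) runs over the items i belonging to taxonomy t.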

The first component of the objective function,

\[
\sum_{p}\sum_{f}\sum_{g} \mathrm{PENALTY}_g\,(PY_{pfg} + NY_{pfg}),
\]

minimizes the vertical distances (weighted deviation)
between the goal curve and the assembled forms. The second
component,

\[
\mathrm{PARAWEI}\sum_{f>1}(\mathrm{Delplus}_f + \mathrm{Delneg}_f),
\]

encourages forms to have the same information. A second
component having value zero does not necessarily imply
parallel forms since the vertical distances at percentile p
from form 1 to form f can have positive or negative signs
depending on whether form f is above or below form 1. These
positive and negative distances can sum up to zero producing
two forms where Delplus_f = Delneg_f = 0. Nevertheless, the
second component has empirically produced parallel forms and
requires only F-1 additional constraints. Constraints (2)
and (3) determine the positive and negative deviation at
each percentile between the assembled forms and the goal
curve. Constraint (4) ensures the required number of items
per taxonomy is satisfied. Constraint (5) ensures that each
item is used at most once. Constraint (6) determines the
total information difference between form 1 and other forms.
Constraints (7) and (8) bound the positive and negative
deviations.
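
The caveat about the second component can be checked with a small Python sketch. The two form curves below are hypothetical four-point curves, not thesis data, and the helper names are illustrative. The sketch computes the values Delplus_f and Delneg_f would take at an optimum of constraint (6), where at most one of them is positive because both are penalized.

```python
def parallel_term(form1_curve, form_f_curve):
    """Return (Delplus_f, Delneg_f) implied by constraint (6)."""
    diff = sum(form1_curve) - sum(form_f_curve)
    return (max(diff, 0.0), max(-diff, 0.0))


def pointwise_gap(form1_curve, form_f_curve):
    """Sum of absolute vertical distances between two form curves."""
    return sum(abs(x - y) for x, y in zip(form1_curve, form_f_curve))


# Hypothetical information curves over four percentiles.
form1 = [1.0, 2.0, 3.0, 2.0]
form2 = [2.0, 3.0, 2.0, 1.0]  # same total information, different shape
form3 = [1.0, 2.0, 3.0, 3.0]  # one unit more information than form 1

delplus2, delneg2 = parallel_term(form1, form2)
gap2 = pointwise_gap(form1, form2)
delplus3, delneg3 = parallel_term(form1, form3)
```

Forms 1 and 2 give Delplus = Delneg = 0 although their curves differ at every percentile, which is exactly the situation the text describes.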

B.  HEURISTIC  APPROACH 

Solving the previous problem optimally has taken
extensive computation time as shown in the next chapter. To
provide solutions quickly, a local search with random restart
heuristic (e.g., [Papadimitriou and Steiglitz 1982]) is
developed.

The main objectives for the heuristic are to quickly
complete one assembly and to quickly evaluate small
variations to the assembly. The heuristic uses only integer
arithmetic within efficient code to help improve
performance.

The heuristic starts by dividing the item pool into
arrays of items where each array corresponds to a taxonomy.
These sub-item pools are eligible sets (ES_t) for each
taxonomy.

Each form consists of vectors for each taxonomy
(Assign_tf). The algorithm consists of three main procedures
(Figure 5): fill_initial_forms; do_swap; and
improve_parallel.

Figure 6 displays the pseudocode for the procedure
fill_initial_forms. A random number generator [Lewis,
Goodman and Miller 1969] is used to assemble the initial
forms subject to all constraints.

[Figure 5 flowchart: within one assembly, fill_initial_forms is
followed by do_swap and improve_parallel; the loop repeats until
a sentinel is reached.]

Figure 5: Main Procedures of the Heuristic.

This figure shows the main procedures for the heuristic algorithm. A
loop over one assembly of all forms runs as often as the user has
chosen. The best assembly is the result.


Assign_tf <- {}; initialize ES_t (assume |ES_t| > F*NITEM_t)
for f = 1 to F
    for t = 1 to T
        while |Assign_tf| < NITEM_t
            randomly select item i from ES_t
            Assign_tf <- Assign_tf + {i}
            ES_t <- ES_t - {i}
        end
    end
end

Figure 6: The Pseudocode for the Procedure fill_initial_forms.

This figure shows how the heuristic randomly assembles the initial
forms. The indices and variables match those from the optimization
model. Assign_tf contains items on form f in taxonomy t. ES_t contains all
items in taxonomy t not currently used on any form.
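
The pseudocode of Figure 6 translates almost line by line into Python. The sketch below is illustrative only: the dictionary-based data layout, the argument names and the toy pool are assumptions, and Python's built-in generator stands in for the Lewis, Goodman and Miller generator used in the thesis.

```python
import random


def fill_initial_forms(pool_by_taxonomy, n_items, n_forms, rng):
    """Randomly assemble initial forms, one taxonomy at a time.

    pool_by_taxonomy: dict taxonomy -> list of item ids (the item pool)
    n_items:          dict taxonomy -> items required per form (NITEM_t)
    Assumes each pool holds more than n_forms * n_items[t] items.
    Returns (assign, eligible): assign[(t, f)] lists the items on form f
    in taxonomy t; eligible[t] lists the items left unused (ES_t).
    """
    eligible = {t: list(items) for t, items in pool_by_taxonomy.items()}
    assign = {(t, f): [] for t in eligible for f in range(n_forms)}
    for f in range(n_forms):
        for t in eligible:
            while len(assign[(t, f)]) < n_items[t]:
                # Draw a random item and remove it from the eligible set.
                i = eligible[t].pop(rng.randrange(len(eligible[t])))
                assign[(t, f)].append(i)
    return assign, eligible


# Hypothetical pool: 20 items in taxonomy "t1", 10 items in taxonomy "t2".
pool = {"t1": list(range(0, 20)), "t2": list(range(20, 30))}
need = {"t1": 3, "t2": 2}
assign, eligible = fill_initial_forms(pool, need, 2, random.Random(0))
```

Removing each drawn item from its eligible set enforces the rule that an item appears on at most one form.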

The procedure do_swap defines a swap as the exchange of
an item from a form (i_out ∈ Assign_tf) with an item from the
appropriate eligible set (i_in ∈ ES_t). Figure 7 shows the
pseudocode for this procedure.


 1  improve <- 1
 2  while improve > 0
 3      improve <- 0
 4      for t = 1 to T
 5          for f = 1 to F
 6              for each item i_out ∈ Assign_tf
 7                  sofar <- ObjFctValue_old
 8                  Assign_tf <- Assign_tf - {i_out}
 9                  for each item i_in ∈ ES_t
10                      Assign_tf <- Assign_tf + {i_in}
11                      calculate ObjFctVal_new
12                      if ObjFctVal_new < sofar (improvement)
13                          sofar <- ObjFctVal_new
14                          candidate <- i_in
15                      end if
16                      Assign_tf <- Assign_tf - {i_in}
17                  end
18                  if sofar < ObjFctValue_old
19                      swap candidate with i_out
20                      update involved curves
21                      improve <- improve + 1
22                  end if
23              end
24          end
25      end
26  end while

Figure 7: The Pseudocode for the Procedure do_swap.

This figure shows how swapping items improves forms. ObjFctValue_old is
the sum of all deviations between form f and the goal curve before
potentially swapping an item and ObjFctVal_new is the sum after a
potential swap. The procedure repeats until no swap yields a decrease
to the objective function of any form.


The objective function value measuring the effectiveness of
the swap is the sum of all deviations between form f and the
goal curve. Improvement, as it is used in this context, means
a decrease of the objective function value caused by
swapping an item. This procedure runs through all forms and
eligible sets and checks whether a swap yields improvement.
The while-loop repeats as long as at least one improvement
is found across all forms and eligible sets.

To increase the speed of the algorithm a baseline for
checking the swaps is used. A baseline in this context is
the sum of all item information curves currently assembled
without the item considered for exchange (i_out). Within the
pseudocode of Figure 7, the baseline can be calculated after
step 8; and doing so reduces the computational effort needed
to determine the new objective function value in step 11.
Only the 100 information values of item i_in have to be added
to the baseline instead of summing over all items currently
assigned. After all items of the eligible set have been
examined, the swap is executed with the item that gives
the most improvement (candidate).
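
The swap loop and the baseline idea can be sketched in Python as follows. This is a simplified illustration, not the thesis code: it uses the unweighted sum of absolute deviations from the goal curve as the objective, iterates over (taxonomy, form) pairs in dictionary order, and works on small hypothetical integer curves of three percentiles instead of 100.

```python
def deviation(curve, goal):
    """Sum of absolute deviations between a form curve and the goal curve."""
    return sum(abs(c - g) for c, g in zip(curve, goal))


def do_swap(assign, eligible, info, goal):
    """Repeat best-improvement swaps until no form improves.

    assign:   dict (taxonomy, form) -> list of item ids on that form
    eligible: dict taxonomy -> list of unused item ids (ES_t)
    info:     dict item id -> list of information values per percentile
    goal:     list of goal-curve values per percentile
    Returns the final form curves as dict form -> list.
    """
    n = len(goal)
    curves = {f: [0] * n for (_, f) in assign}
    for (t, f), items in assign.items():
        for i in items:
            curves[f] = [x + y for x, y in zip(curves[f], info[i])]
    improved = True
    while improved:
        improved = False
        for (t, f), items in assign.items():
            for pos in range(len(items)):
                i_out = items[pos]
                old = deviation(curves[f], goal)
                # Baseline: current form curve without the item i_out.
                baseline = [x - y for x, y in zip(curves[f], info[i_out])]
                best, candidate = old, None
                for i_in in eligible[t]:
                    trial = [x + y for x, y in zip(baseline, info[i_in])]
                    new = deviation(trial, goal)
                    if new < best:
                        best, candidate = new, i_in
                if candidate is not None:
                    # Execute the best swap and update the affected curve.
                    items[pos] = candidate
                    eligible[t].remove(candidate)
                    eligible[t].append(i_out)
                    curves[f] = [x + y for x, y in zip(baseline, info[candidate])]
                    improved = True
    return curves


# Hypothetical data: one taxonomy, two forms, one item per form.
goal = [10, 10, 10]
info = {0: [2, 2, 2], 1: [9, 10, 11], 2: [5, 5, 5], 3: [10, 10, 9]}
assign = {("t", 0): [0], ("t", 1): [2]}
eligible = {"t": [1, 3]}
curves = do_swap(assign, eligible, info, goal)
```

Each trial costs one vector addition on top of the baseline, and every executed swap strictly lowers the deviation of one form, so the loop terminates.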

The procedure improve_parallel checks if swapping items
between forms can improve the forms. The procedure starts by
finding the form with the smallest sum of all deviations
from the goal curve so far. This best form is the one with
which the other forms have to be aligned. Figure 8 displays
the pseudocode for the procedure improve_parallel. At this
stage, the heuristic does not allow the objective function
to increase.

An improving swap between forms happens only after all
items within a taxonomy on all forms have been compared with
an item on the best form. The calculation of the curves uses
the baseline principle again. Improve_parallel terminates
when no item is swapped on any form.

Figure 8: The Pseudocode for the Procedure improve_parallel.

This figure shows swaps allowed between forms. A swap, given it improves
the objective function value, occurs after one item on the best form has
been compared with all other assigned items on the other forms.

IV. COMPUTATIONAL RESULTS


The task is to assemble forms for seven different
tests: Arithmetic Reasoning (AR), Auto and Shop (AS),
Electronics Information (EI), General Science (GS),
Mechanical Comprehension (MC), Mathematical Knowledge (MK),
and Word Knowledge (WK). Table 1 lists the test
specifications.


Test   Item pool size   Forms needed   Items on form   Taxonomies
AR     338              2              30              5
AS     196              2              25              2
EI     190              2              20              4
GS     313              2              25              12
MC     296              4              25              6
MK     327              4              25              5
WK     276              2              35              2

Table 1: Test Requirements and Item Pools.

This table lists the specifications for each of the tests. For example,
the AR-Test requires the creation of two forms each having 30 items. The
30 items, falling into five taxonomies, must be selected from an item
pool of 338 items.

A.  OPTIMIZATION  PARAMETER  SETTINGS 

The optimization model formulated in the previous
chapter requires the specification of a number of
parameters. A summary sheet for each test contains results as
well as parameter settings. We use the AR-Test as an
example.

Figure 9 shows the implemented objective function. All
values were empirically developed. The penalties for the
unbounded variables, py4 and ny4, are 100. Other values are:
CAT_1 = 0.01; CAT_2 = 0.05; CAT_3 = 0.10;
PENALTY_1 = 0.00001; PENALTY_2 = 1.00; PENALTY_3 = 5.00; and
PARAWEI = 25.


\[
\sum_{f}\sum_{p} \bigl( 100\,py4_{pf} + 100\,ny4_{pf}
  + 0.00001\,py1_{pf} + 1\,py2_{pf} + 5\,py3_{pf}
  + 0.00001\,ny1_{pf} + 1\,ny2_{pf} + 5\,ny3_{pf} \bigr)
  + 25 \sum_{f>1} (\mathrm{Delplus}_f + \mathrm{Delneg}_f)
\]

Figure 9: The objective function parameters for the optimization model.
This figure shows the objective function implemented in GAMS for the
AR-Test. It measures the overall distance between the forms and the goal
curve at each percentile. The pys and nys are the deviation variables.
25 * Σ(Delplus + Delneg) is the subgoal to encourage parallel forms.

We use only upper bounds on the deviation variables
(CAT_g) for groups 1, 2 and 3. The following pages display
for each test the bounds for the penalty groups and the
weights for the subgoal.

B.  OPTIMIZATION  RESULTS 

This section shows results for the assembled tests. The
integrality gap provided is the difference between the best
integer solution identified and a lower bound on the
solution, expressed as a percentage of the lower bound. The
results for all tests are presented in alphabetical order.
Table 2 summarizes the numerical results obtained. Figures
10 to 16 show graphical results.


Table 2: Numerical Results of the Optimization Assembly.

Table 2 summarizes all numerical results for tests assembled using
optimization, where objfctvalue = Objective Function Value. The
integrality gap provided is the difference between the best integer
solution identified and a lower bound on the solution, expressed as a
percentage of the lower bound (i.e., gap = (best solution - lower
bound) / lower bound).
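
As a quick check of this definition, the following Python sketch recomputes the gap from the objective function values listed in the AS-Test and GS-Test summaries of this chapter. The function name is illustrative.

```python
def integrality_gap(best_solution, lower_bound):
    """Gap between best integer solution and lower bound, in percent."""
    return 100.0 * (best_solution - lower_bound) / lower_bound


# Objective function values from the AS-Test and GS-Test summaries.
as_gap = integrality_gap(2862.73, 2788.00)
gs_gap = integrality_gap(8433.03, 8095.66)
```

Rounded to one decimal place, these reproduce the reported gaps of 2.7 % and 4.2 %.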

Model results come from an IBM RS6000 Model 590
workstation using GAMS and the OSL solver. The model size
varies, primarily according to the number of forms and the
cardinality of the item pool. The approximate size of the
largest model, MK-Test, is shown below:

number of constraints: 1,150;

number of continuous variables: 4,500;

number of binary variables: 1,300; and

number of non-zero elements: 250,000.

AS-Test (Auto and Shop):

General Requirements:
    forms: 2;
    items: 25 each; and
    taxonomies: 2 (11, 13 items in taxonomy 1 and 2).
Settings:
    CAT-values: 0.05, 0.1, 0.5;
    penalties: 0.00001, 1, 5;
    PARAWEI: 25; and
    item pool: 196 items.
Numerical Results:
    objective function value (lower bound): 2,788.00;
    objective function value (best solution): 2,862.73;
    integrality gap: 2.7 %; and
    runtime (seconds): 215.

Figure  11:  Graphical  Results  for  the  AS-Test. 

This  figure  shows  results  obtained  for  the  AS-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

EI - Test (Electronics Information):


General Requirements:
forms: 2;
items: 20 each; and
taxonomies: 4 (10, 4, 2, 4 items in taxonomy 1 to 4).

Settings:
CAT-values: 0.05, 0.1, 0.7;
penalties: 0.00001, 1, 10;
PARAWEI: 3; and
item pool: 190 items.

Numerical Results:
objective function value (lower bound): 9,489.65;
objective function value (best solution): 9,561.94;
integrality gap: 1.0 %; and
runtime (seconds): 17.

Graphical  Results:  Figure  12  below. 


Figure 12: Graphical Results for the EI-Test.

This figure shows results obtained for the EI-Test with information on
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

GS - Test (General Science):


General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 12 (3, 3, 4, 2, 2, 3, 1, 2, 2, 1, 1, 1).

Settings:
CAT-values: 0.05, 0.1, 0.5;
penalties: 1, 10, 100;
PARAWEI: 100; and
item pool: 313 items.

Numerical Results:
objective function value (lower bound): 8,095.66;
objective function value (best solution): 8,433.03;
integrality gap: 4.2 %; and
runtime (seconds): 312.

Graphical Results: Figure 13 below.


Figure  13:  Graphical  Results  for  GS-Test. 

This  figure  shows  results  obtained  for  the  GS-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

MC - Test (Mechanical Comprehension):


General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 6 (11, 2, 2, 2, 4, 4 items in taxonomy 1 to 6).

Settings:
CAT-values: 0.01, 0.05, 0.1;
penalties: 0.00001, 1, 5;
PARAWEI: 300; and
item pool: 296 items.

Numerical Results:
objective function value (lower bound): 125.04;
objective function value (best solution): 1,187.83;
integrality gap: 850 %; and
runtime (seconds): 50,000 (13.8 hours).

Graphical  Results:  Figure  14  below. 


Figure  14:  Graphical  Results  for  the  MC-Test. 

This  figure  shows  results  obtained  for  the  MC-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

MK - Test (Mathematical Knowledge):


General Requirements:
forms: 4;
items: 25 each; and
taxonomies: 5 (3, 5, 9, 7, 1 items in taxonomy 1 to 5).

Settings:
CAT-values: 0.05, 0.1, 0.5;
penalties: 1, 10, 100;
PARAWEI: 300; and
item pool: 327 items.

Numerical Results:
objective function value (lower bound): 2,006.71;
objective function value (best solution): 7,278.24;
integrality gap: 7.3 %; and
runtime (seconds): 50,000 (13.8 hours).

Graphical Results: Figure 15 below.


Figure  15:  Graphical  Results  for  the  MK-Test. 

This  figure  shows  results  obtained  for  the  MK-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  to 
form  4  are  the  information  curves  for  each  form. 


WK - Test (Word Knowledge):


General Requirements:
forms: 2;
items: 35 each; and
taxonomies: 2 (13, 22 items in taxonomy 1 and 2).

Settings:
CAT-values: 0.01, 0.05, 0.1;
penalties: 0.000001, 1, 5;
PARAWEI: 500; and
item pool: 276 items.

Numerical Results:
objective function value (lower bound): 3,588.32;
objective function value (best solution): 5,188.42;
integrality gap: 39.2 %; and
runtime (seconds): 13,934 (3.9 hours).

Graphical  Results:  Figure  16  below. 




Figure  16:  Graphical  Results  for  the  WK-Test. 

This  figure  shows  results  obtained  for  the  WK-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

C. RESULTS OF THE HEURISTIC APPROACH


The  objective  function  implemented  in  the  heuristic  is 
as  follows : 

\sum_{p} \sum_{f} ( PY_{pf} + \ldots )

This  simplification  of  the  objective  function  previously 
used  (i.e.,  unweighted  deviations  and  no  parallel  subgoal) 
was  chosen  for  ease  of  computation. 
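To make the idea of an unweighted deviation objective concrete, here is a minimal sketch. It assumes that each form has an information value at every percentile point p and that the deviation of form f at point p is the absolute distance to the goal curve; the names `goal` and `form_info` and the sample numbers are hypothetical and are not taken from the thesis.

```python
def unweighted_deviation(goal, form_info):
    """Sum over forms f and percentile points p of |information - goal|."""
    total = 0.0
    for curve in form_info:              # one information curve per form f
        if len(curve) != len(goal):
            raise ValueError("each form needs one value per percentile point")
        for target, value in zip(goal, curve):  # one pair per percentile point p
            total += abs(value - target)
    return total


# Hypothetical goal curve and two form curves at three percentile points.
goal = [2.0, 3.0, 4.0]
form_info = [
    [2.5, 3.0, 3.5],   # form 1
    [1.5, 3.5, 4.0],   # form 2
]
print(unweighted_deviation(goal, form_info))  # prints 2.0
```

Because every deviation carries the same weight and no term compares the forms with each other, two forms can score well here while still differing from one another.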

The following pages display the objective function
values per repetition (random restart) of the heuristic as
well as the graph for the best solution found (Figures 17 to
30).
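The random restart scheme itself can be sketched generically. The code below is not the thesis's Pascal implementation; it only illustrates the bookkeeping of recording one objective function value per repetition and keeping the best solution. The toy item pool, the `build` and `objective` functions, and the target value are assumptions made for illustration.

```python
import random


def random_restart(build_solution, objective, repetitions, seed=0):
    """Run a randomized construction several times and keep the best result."""
    rng = random.Random(seed)
    history = []                          # objective value of every repetition
    best_value, best_solution = float("inf"), None
    for _ in range(repetitions):
        solution = build_solution(rng)
        value = objective(solution)
        history.append(value)
        if value < best_value:
            best_value, best_solution = value, solution
    return best_solution, best_value, history


# Toy stand-in for form assembly: pick 3 items whose values sum close to a target.
pool = [1.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0]
target = 12.0


def build(rng):
    return rng.sample(range(len(pool)), 3)


def objective(items):
    return abs(sum(pool[i] for i in items) - target)


best, value, history = random_restart(build, objective, repetitions=100)
print(f"best value {value} after {len(history)} repetitions")
```

Plotting `history` together with a horizontal line at `value` gives the kind of per-restart chart described in the figure captions that follow.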

The heuristic algorithm is implemented on a Pentium 166
PC, written in Standard Pascal [e.g., Silicon Valley Software
1991]. Table 3 shows the runtimes and the objective
function values.


Test   Objective function value   Repetitions   Runtime (seconds)
AR              97.74                 100              120
AS             230.77                 100              150
EI             227.10                 100              120
GS             117.13                 100              130
MC              47.80                 100              250
MK             257.94                 100              280
WK             280.68                 100              160

Table 3: Results for tests assembled with the Heuristic Approach.
As the runtimes show, the heuristic provides results very quickly.




Figure  18:  Graphical  Results  for  the  AR-Test. 

This  figure  shows  results  obtained  for  the  AR-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

AS - Test (Auto and Shop):


General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 2 (11, 13 items in taxonomy 1 and 2).

Execution Specifics:
repetitions: 100; and
objective function value: 230.77.


Figure 19: Objective Function Values for each Random Restart.

The flat line indicates the minimum value of the best solution obtained.



Figure  20:  Graphical  Results  for  the  AS-Test. 

This  figure  shows  results  obtained  for  the  AS-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 


EI - Test (Electronics Information):


General Requirements:
forms: 2;
items: 20 each; and
taxonomies: 4 (10, 4, 2, 4 items in taxonomy 1 to 4).

Execution Specifics:
repetitions: 100; and
objective function value: 227.10.


Figure 21: Objective Function Values for each Random Restart.

The flat line indicates the minimum value of the best solution obtained.


Figure 22: Graphical Results for the EI-Test.

This figure shows results obtained for the EI-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.

Figure 23: Objective Function Values for each Random Restart.

The  flat  line  indicates  the  minimum  value  of  the  best  solution  obtained. 


Figure  24:  Graphical  Results  for  the  GS-Test. 

This  figure  shows  results  obtained  for  the  GS-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

MC - Test (Mechanical Comprehension):


General Requirements:
forms: 4;
items: 25 each; and
taxonomies: 6 (11, 2, 2, 2, 4, 4 items in taxonomy 1 to 6).

Execution Specifics:
repetitions: 100; and
objective function value: 47.80.


Figure  25:  Objective  Function  Values  for  each  Random  Restart. 

The  flat  line  indicates  the  minimum  value  of  the  best  solution  obtained. 


Figure  26:  Graphical  Results  for  the  MC-Test. 

This  figure  shows  results  obtained  for  the  MC-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  to 
form  4  are  the  information  curves  for  each  form. 

MK - Test (Mathematical Knowledge):


General Requirements:
forms: 4;
items: 25 each; and
taxonomies: 5 (3, 5, 9, 7, 1 items in taxonomy 1 to 5).

Execution Specifics:
repetitions: 100; and
objective function value: 257.94.


Figure  27:  Objective  Function  Values  for  each  Random  Restart. 

The  flat  line  indicates  the  minimum  value  of  the  best  solution  obtained. 


Figure 28: Graphical Results for the MK-Test.

This  figure  shows  results  obtained  for  the  MK-Test  with  information  on 
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  to 
form  4  are  the  information  curves  for  each  form. 

WK - Test (Word Knowledge):


General Requirements:
forms: 2;
items: 35 each; and
taxonomies: 2 (13, 22 items in taxonomy 1 and 2).

Execution Specifics:
repetitions: 100; and
objective function value: 280.68.




Figure  29:  Objective  Function  Values  for  each  Random  Restart. 

The  flat  line  indicates  the  minimum  value  of  the  best  solution  obtained. 




Figure  30:  Graphical  Results  for  the  WK-Test. 

This figure shows results obtained for the WK-Test with information on
the  vertical  axis  and  the  percentiles  on  the  horizontal  axis.  Form  1  and 
form  2  are  the  information  curves  for  each  form. 

D. DISCUSSION OF THE RESULTS


The optimization approach yields good results for the
form assembly. The assembled forms for five out of the seven
tests have information curves (form curves) that are very
close to the goal curve and parallel to each other. In the
EI- and WK-Test the form curves do not reach the goal curve
in the lower half of the percentile range. Setting the weight
of the parallel subgoal to zero for the EI- and WK-Test does
not improve the shape of the form curves. Increasing the
weight for the subgoal yields marginally more parallel forms,
but increases the overall distance to the goal curve much
more. Changing the bounds for the deviation variables has
little effect. Discussions with DMDC indicate the item pools
for the EI- and WK-Test are known to be "weak" since, in
their opinion, too many items were extracted for Computer
Adaptive Testing. (See Wainer [1990] for a description of
this relatively new method of testing.) They are working to
restock these item pools.

The heuristic yields good results for the AR-, AS- and
MC-Test. Results for the GS-Test are not very parallel in the
higher percentile range, and results for the MK-Test are not
very parallel in the lower percentile range. The form curves
of the EI- and WK-Tests indicate the same deficiency in the
item pool in the lower half of the percentile range as
mentioned above. The number of repetitions was increased
to 1,000 in the AS- and EI-Test in order to see whether the
heuristic results can be improved. The objective function
value decreased from 230 to 225 in the AS-Test and
only from 227 to 226 in the EI-Test.
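The size of these improvements can be put in relative terms with a short calculation. The sketch below uses only the rounded objective function values quoted in the paragraph above; the helper name `relative_improvement` is illustrative.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage decrease of the objective function value."""
    return 100.0 * (before - after) / before


# Rounded values for 100 versus 1,000 repetitions, as quoted in the text.
results = [("AS", 230.0, 225.0), ("EI", 227.0, 226.0)]
for test, before, after in results:
    print(f"{test}-Test: {relative_improvement(before, after):.1f} % lower")
# prints 2.2 % for the AS-Test and 0.4 % for the EI-Test
```

A tenfold increase in repetitions therefore lowers the objective function value by only a few percent at most in these two tests.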

V. USING BOTH OPTIMIZATION AND HEURISTIC APPROACHES

A.  USING  THE  HEURISTIC  SOLUTION  AS  A  BOUND 

Table  4  summarizes  a  direct  comparison  of  the  objective 
function  values  where  the  heuristic  solutions  are  converted 
to  the  objective  function  of  the  optimization  model. 


Test   Optimization Objective   Heuristic Objective
       function value           function value
AR            932.33                 1,840.57
AS          2,862.73                 4,938.76
EI          9,561.94                10,676.62
GS          8,433.03                11,830.96
MC          1,187.11                 3,377.86
MK          7,278.24                72,095.93
WK          5,188.42                17,883.56

Table 4: Comparison of the Results.

This table provides the objective function values for both the best
heuristic solution and the best solution obtained solving the
optimization model using the optimization model's objective function.

The optimization approach yields smaller objective function
values than the heuristic, as would be expected when using
the optimization model's objective function as the evaluation.
However, it is surprising that the differences are
so great when the graphical results look similar. For the
AR-Test, the heuristic solution in the percentile range 20
to 50 is not as parallel as the optimization solution, and
this difference is responsible for nearly doubling the
objective function value. This is similar in the AS-Test,
where form 2 is constantly below form 1. For the MK- and WK-
Tests the corresponding objective function value is 9.9 and
3.5 times as high as the optimization result. The higher
objective function value of the heuristic solution for the
MK-Test (Figure 15 and Figure 28) is caused by the parallel
gap between form one and the three other forms in the lower
percentiles, combined with a high weight for the parallel
subgoal. In the WK-Test the alternating behavior of the
forms around each other in Figure 16 is similar to the
heuristic solution (Figure 30). However, there is an obvious
dominance of form one over form two in the lower percentile
range. The heuristic solution for the MC-Test has a higher
value than that of the optimization solution; however, the
graphical result of the heuristic looks much better than that
of the optimization. This is most likely due to the cancellation
effect of positive and negative distances in Figure 14.
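The ratios discussed above can be recomputed directly from the values in Table 4. The following sketch does only that; the dictionary name `table4` is an illustrative choice, and the numbers are copied from the table as printed.

```python
# (optimization value, heuristic value) per test, copied from Table 4.
table4 = {
    "AR": (932.33, 1840.57),
    "AS": (2862.73, 4938.76),
    "EI": (9561.94, 10676.62),
    "GS": (8433.03, 11830.96),
    "MC": (1187.11, 3377.86),
    "MK": (7278.24, 72095.93),
    "WK": (5188.42, 17883.56),
}

# Ratio of the heuristic value to the optimization value for each test.
ratios = {test: heur / opt for test, (opt, heur) in table4.items()}

for test, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{test}: heuristic / optimization = {ratio:.2f}")
```

Sorted this way, the EI-Test shows the smallest ratio and the MK-Test the largest, which matches the order of concern in the discussion.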

Using the heuristic solution as an upper bound for the
objective function value when solving the model using GAMS and OSL
yields better results in almost all cases, as shown in Table
5. Table 5 shows the MC-Test is an exception, since the best
solution with the heuristic bound is worse than without it.
While this may happen due to OSL's branching choice within
its branch and bound enumeration, having a bound should help
in almost all cases.
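Why an upper bound usually helps can be illustrated with a toy branch and bound. The sketch below is not OSL's algorithm and not the form assembly model; it solves a deliberately simple problem (pick exactly k costs with minimum sum) with a fixed depth-first branching order, and all names and data are hypothetical. The point is only that a known feasible value lets the search discard subproblems earlier.

```python
def branch_and_bound(costs, k, upper_bound=float("inf")):
    """Pick exactly k costs with minimum sum; return (best value, nodes explored).

    upper_bound is the value of a known feasible solution, e.g. from a heuristic.
    """
    n = len(costs)
    best = upper_bound
    nodes = 0

    def lower_bound(i, chosen, total):
        # Optimistic completion: add the cheapest items still available.
        return total + sum(sorted(costs[i:])[: k - chosen])

    def visit(i, chosen, total):
        nonlocal best, nodes
        nodes += 1
        if chosen == k:                      # complete selection
            best = min(best, total)
            return
        if n - i < k - chosen:               # not enough items left
            return
        if lower_bound(i, chosen, total) >= best:
            return                           # cannot beat the incumbent
        visit(i + 1, chosen + 1, total + costs[i])   # take item i
        visit(i + 1, chosen, total)                  # skip item i

    visit(0, 0, 0.0)
    return best, nodes


costs = [9, 4, 7, 1, 8, 3, 6, 2]
plain_best, plain_nodes = branch_and_bound(costs, k=3)
# Hypothetical heuristic solution: items with costs 4, 1 and 3 (value 8).
bounded_best, bounded_nodes = branch_and_bound(costs, k=3, upper_bound=8.0)
print(plain_best, plain_nodes)
print(bounded_best, bounded_nodes)
```

With a fixed branching order, as here, the bounded run never explores more nodes than the unbounded one. A production solver may change its branching decisions once a bound is supplied, which is one way an exception like the MC-Test can arise.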

B. CONCLUSIONS


This thesis demonstrates how a linear mixed
integer goal program can support DMDC's form assembly
process. The developed heuristic is a good supplement that
can be used with the optimization approach described. In
some cases the heuristic solution yields good upper bounds
for the optimization that can decrease the computation time.

C.  RECOMMENDATIONS 

The optimization model should be extended to capture
the other three ASVAB-Tests.

This heuristic algorithm should be considered a prototype.
Experiments should be conducted with the objective
function to find the most useful expression. While changing
the objective function to match that currently implemented
in the optimization model would be a natural first step,
experimentation should be more expansive. The heuristic can
easily accommodate a nonlinear objective function (an option
not available in integer linear programming).

Further  research  can  also  be  conducted  to  implement  a 
heuristic  for  Computer  Adaptive  Testing. 